DeepConversion: Voice conversion with limited parallel training data
Similar resources
Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data
Voice conversion (VC) is a technique that aims to map the individuality of a source speaker to that of a target speaker, and Gaussian mixture model (GMM) based methods are prevalent. Despite their wide use, two major problems remain to be resolved: over-smoothing and over-fitting. The latter arises naturally when the structure of the model is too complicated given limited ...
Voice Conversion Using Exclusively Unaligned Training Data
Although all conventional voice conversion approaches require equivalent training utterances from the source and target speakers, several recently proposed applications call for dropping this requirement. In this paper, we present an algorithm which finds corresponding time frames within unaligned training data. The performance of this algorithm is tested by means of a voice conversion framework based on l...
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
We propose a parallel-data-free voice conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general-purpose, high quality, and parallel-data-free, and it works without any extra data, modules, or alignment procedure. It is also noteworthy that it avoids over-smoothing, which occurs in many conventional statistical mode...
Text-independent F0 transformation with non-parallel data for voice conversion
In voice conversion, a simple frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independenc...
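The frame-level mean and variance normalization mentioned in this abstract can be sketched as follows. This is a minimal illustration of the conventional baseline, not the method the paper proposes. It assumes the normalization is applied in the log-F0 domain and that unvoiced frames are marked with 0.0; the function names `log_f0_stats` and `convert_f0` are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_frames):
    """Return (mean, std) of log-F0 over voiced frames only.

    Assumption: unvoiced frames are encoded as 0.0 and are skipped.
    """
    voiced = [math.log(f) for f in f0_frames if f > 0.0]
    return mean(voiced), pstdev(voiced)


def convert_f0(f0_frames, src_stats, tgt_stats):
    """Shift and scale source log-F0 to match the target speaker's statistics.

    Each voiced frame is standardized with the source mean and std, then
    rescaled with the target mean and std. No parallel data or alignment
    is needed, since only per-speaker statistics are used.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    converted = []
    for f0 in f0_frames:
        if f0 <= 0.0:
            # Unvoiced frame: pass through unchanged.
            converted.append(0.0)
            continue
        z = (math.log(f0) - src_mean) / src_std
        converted.append(math.exp(z * tgt_std + tgt_mean))
    return converted
```

The statistics for each speaker can be computed from any set of that speaker's utterances, which is why this baseline is text-independent.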
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences
We extend our recently proposed approach to cross-lingual TTS training to voice conversion, without using parallel training sentences. It employs Speaker Independent, Deep Neural Net (SIDNN) ASR to equalize the difference between source and target speakers and Kullback-Leibler Divergence (KLD) to convert spectral parameters probabilistically in the phonetic space via ASR senone posterior probab...
Journal
Journal title: Speech Communication
Year: 2020
ISSN: 0167-6393
DOI: 10.1016/j.specom.2020.05.004